Lobish: Symbolic Language for Interpreting Electroencephalogram Signals in Language Detection Using Channel-Based Transformation and Pattern

Electroencephalogram (EEG) signals contain information about the brain’s state as they reflect the brain’s functioning. However, the manual interpretation of EEG signals is tedious and time-consuming. Therefore, automatic EEG translation models need to be proposed using machine learning methods. In this study, we proposed an innovative method to achieve high classification performance with explainable results. We introduce channel-based transformation, a channel pattern (ChannelPat), the t algorithm, and Lobish (a symbolic language). By using channel-based transformation, EEG signals were encoded using the index of the channels. The proposed ChannelPat feature extractor encoded the transition between two channels and served as a histogram-based feature extractor. An iterative neighborhood component analysis (INCA) feature selector was employed to select the most informative features, and the selected features were fed into a new ensemble k-nearest neighbor (tkNN) classifier. To evaluate the classification capability of the proposed channel-based EEG language detection model, a new EEG language dataset comprising Arabic and Turkish was collected. Additionally, Lobish was introduced to obtain explainable outcomes from the proposed EEG language detection model. The proposed channel-based feature engineering model was applied to the collected EEG language dataset, achieving a classification accuracy of 98.59%. Lobish extracted meaningful information from the cortex of the brain for language detection.


Introduction
Language detection is an increasingly important field of research in today's global and multilingual world [1]. Automatic language detection systems have critical applications in various domains [2]. For instance, they are used in multilingual customer services for accurate language routing, language-based analyses of social media content, online content filtering and classification, and even in national security applications for detecting suspicious communications. While traditional language detection methods typically work with text or audio data [3][4][5], detecting language using brain signals is a novel and promising approach.
Language detection plays a crucial role in various applications across different sectors. In multilingual customer services, it enables the automatic routing of customer calls to representatives who speak the appropriate language, enhancing customer experience and operational efficiency [6]. For social media platforms and content providers, language detection facilitates automatic categorization and analyses of content in different languages, allowing for better content management and targeted marketing [7]. In the field of machine translation, accurate language detection significantly improves the efficiency and accuracy of translation systems [8]. Libraries and archives benefit from language detection for the automatic classification and indexing of multilingual documents, making vast collections more accessible [9]. Educational technologies leverage language detection to identify the user's native language and the language being learned, personalizing the learning experience in language learning applications [10]. Moreover, in the realm of cybersecurity, language detection aids in the analysis of suspicious communications, contributing to threat detection and prevention efforts [11]. These diverse applications underscore the importance of accurate and efficient language detection systems in our increasingly interconnected and multilingual world.
In recent years, the analysis of brain signals using brain-computer interfaces (BCIs) has become very popular [12]. An electroencephalogram (EEG) measures the electrical activity of the brain and reflects individuals' cognitive, emotional, linguistic, and neurological states [13,14]. EEG signals are generally used for diagnosis in the field of neurology, for example to detect schizophrenia, depression, and Parkinson's disease [15]. Recent advances in artificial intelligence have significantly contributed to the automatic classification of EEG signals [16,17]. Many artificial intelligence-based models have been developed to capture hidden patterns in EEG signals and successfully perform various classification tasks in BCI-based systems [18]. Language is a basic tool with its own rules and structures that people use to communicate and express themselves [19]. Every written and spoken language has its own structure, words, syllables, and accents [20]. Therefore, there are differences in brain signals when a native speaker encounters a sentence that they have seen or heard in their own language. EEG-based automatic language detection aims to identify the language a person is thinking or speaking from EEG data [21]. In this way, it aims to understand the neuroscientific basis of language learning processes and to develop new solutions in areas such as language therapy, language learning, and language education [22,23].
In this work, a new machine learning model has been developed for automatic language detection using EEG signals. To test the model, EEG signals were collected from Arabic and Turkish native speakers. From these collected signals, a feature vector is extracted using a new feature extractor called ChannelPat, and the dimensionality of the feature vector is reduced using the INCA method. In the last stage of the model, a new ensemble kNN method is proposed and the collected signals are classified. The results show that EEG signals and the ChannelPat method can be used for automatic language detection with high accuracy and precision.
Recognizing the need for explainable models, we introduced Lobish, a symbolic language designed to provide interpretable results by encoding and interpreting EEG signals in a clear and structured manner. Lobish uses four letters, each representing a different brain lobe, enabling neuroscientific explanations of brain activity during language processing.
Given the multi-channel nature of EEG signals, we proposed two specific channel-based models to enhance classification performance. These models take advantage of the distinct patterns observed across different channels, leading to more accurate language detection.
To test the proposed models, we created a new EEG language dataset by collecting signals from native speakers of Arabic and Turkish. This dataset was essential in evaluating the effectiveness of our approach.

Literature Review
EEG-based language detection is not one of the most frequently studied topics in the literature [24]. In particular, no dataset is available in this field. Existing EEG and language-based studies focus on letter/syllable/word/sentence recognition and native speaker detection.
In this context, Table 1 summarizes the studies that use EEG and language-based machine learning models. As shown in Table 1, studies in the literature on language using EEG signals are mostly focused on word/sentence recognition [26,28,32]. Only a limited number of native language recognition studies are available [24,25]. This work presents a new contribution to the literature and develops a native speaker detection model.

Literature Gaps
The research gaps based on the literature review are given below:

• Deep learning models are widely employed by researchers. As a result, many deep learning-based models are used to classify EEG signals with high classification performance. However, these deep learning models often have high time complexities [34].
• There is a limited number of explainable models in this area. Most models have focused solely on classification performance, neglecting the interpretability of the results.
• In feature engineering, there are few specialized classification models. Most researchers have generally relied on well-known classifiers.

Motivation and the Proposed Feature Engineering Model
In this work, we aimed to address the above-cited research gaps in the field of EEG signal classification. Firstly, while deep learning models are widely used for automatically classifying EEG signals due to their high classification performance, they often suffer from high time complexities. To tackle this issue, we proposed a feature engineering model designed to perform accurate classification with linear time complexity. This model consists of three phases: (i) feature extraction, (ii) feature selection, and (iii) classification. Our aim is to provide an efficient alternative to deep learning models, whose training of millions of parameters is computationally far more expensive.
Secondly, there is a significant gap in the availability of explainable models in EEG signal analyses. Most existing models focus solely on classification performance, neglecting the interpretability of the results. To address this, we developed a new symbolic language, termed Lobish, to provide explainable results. Lobish introduces a method for encoding and interpreting EEG signals, facilitating the analysis of brain activity in a clear and structured manner. By using four letters, each representing a different brain lobe, Lobish enables neuroscientific explanations for the observed activities.
Thirdly, in feature engineering, there is a lack of specialized classification models tailored for EEG signals. Given that EEG signals consist of multiple channels, we proposed two channel-based models specifically designed to improve classification performance. Additionally, we introduced a specific ensemble model that was applied to the k-nearest neighbor (kNN) algorithm, a well-known and simple distance-based classifier, to achieve high classification accuracy.
We collected a new EEG language dataset from participants whose mother languages are Turkish and Arabic to develop the proposed model.

Novelties and Contributions
Innovations:
• We have proposed a new channel-based transformation model that encodes the signals using the channel indices.
• A new channel-based feature extraction function, termed ChannelPat, has been proposed in this work.
• An EEG language dataset was collected for this work.
• The tkNN classifier has been proposed to achieve higher classification performance.
• A channel-based feature engineering model has been presented to demonstrate the classification ability of the proposed channel-based methods.
• Lobish is a new-generation explainable result generator and a symbolic language.
Contributions:
• A novel EEG language dataset was collected. We used two data collection strategies: (1) listening and (2) demonstration.
• By proposing Lobish, we obtained explainable results related to the cortical area.

Method
This section introduces the proposed new methods designed to achieve classification and explainable results from the EEG language dataset. The methods include (i) channel-based transformation, (ii) ChannelPat, (iii) tkNN, (iv) Lobish, and (v) the overall EEG language feature engineering model.

Channel-Based Transformation
The aim of the channel-based transformation is to convert the multi-channel EEG signals into a format that allows for the analysis of the relationships and patterns between the different EEG channels. This transformation is a crucial preprocessing step that simplifies the raw EEG data and makes them more suitable for subsequent feature extraction and classification processes.
The objectives of the channel-based transformation are as follows: The raw EEG signals are recorded across multiple channels, with each channel representing electrical activity from a different region of the brain. The raw data can be complex and challenging to work with directly.
The channel-based transformation simplifies these data by encoding them into a sequence of channel indices, making them easier to analyze and process.
During the transformation, the EEG signal values from all channels at each time point are sorted. The channels are then ranked based on their activity levels (i.e., signal strength).
This ranking provides insight into which brain regions (represented by channels) are most active at any given time, which is crucial for understanding brain dynamics.
The transformation converts the multi-channel EEG signal into a uniform sequence of indices representing the relative activity of each channel. This uniform representation makes it easier to apply machine learning techniques, as it standardizes the input data format.
The transformed signal, now coded from 1 to 14 (for a 14-channel EEG system), is ready for further processing. This step lays the foundation for methods like ChannelPat, which will extract meaningful features from these transformed data.
By encoding the data into indices, it becomes possible to focus on the transitions between channels, which can be indicative of brain state changes.
The presented channel-based transformation aims to convert raw EEG signals into a simplified, standardized format that captures the relative activity of the different channels. This transformation is essential for enabling more effective feature extraction and ultimately improving the performance of classification models in analyzing EEG data. This method is simple and its pseudocode is given in Algorithm 1.
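The sorting-and-ranking idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' Algorithm 1: the function name `channel_transform`, the descending sort on raw amplitude, and the concatenation of the ranked indices of consecutive time points into one sequence are all assumptions made for the example.

```python
def channel_transform(eeg, descending=True):
    """Encode a multi-channel signal as a sequence of 1-based channel indices.

    eeg: list of time points, each a list with one amplitude per channel.
    Assumption: channels are ranked by raw amplitude at each time point and
    the ranked indices of consecutive time points are concatenated.
    """
    coded = []
    for sample in eeg:
        # Rank channel positions by their amplitude at this time point.
        order = sorted(range(len(sample)), key=lambda ch: sample[ch],
                       reverse=descending)
        # Store 1-based indices, so a 14-channel system is coded from 1 to 14.
        coded.extend(ch + 1 for ch in order)
    return coded


# Hypothetical toy signal: 2 time points, 3 channels.
toy_signal = [[0.2, -1.0, 3.5], [4.0, 0.1, 0.3]]
coded_signal = channel_transform(toy_signal)
print(coded_signal)  # [3, 1, 2, 1, 3, 2]
```

Because the number of channels is fixed, the per-sample sort costs a constant amount of work, so the whole pass grows linearly with the signal length.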

Channel Pattern
For this work, a new feature extraction function has been proposed. This function has been utilized with the channel-based transformation. In this feature extraction function, we coded the channel transition to create a map signal. After that, the histogram of the created map signal was extracted and used as the feature vector.
The primary goal of ChannelPat is to capture the transitions between EEG channels, which are indicative of the dynamic interactions between different brain regions. By analyzing the activity shifts from one channel to another, ChannelPat can reveal patterns associated with specific brain states or cognitive processes.
ChannelPat converts the sequential transitions between channels into a "map signal." This map signal serves as a representation of the EEG data that emphasizes the temporal relationships between channels, rather than just their individual activities.
The steps of this model are given below.

• S1: Apply the channel-based transformation to the signal.
• S2: Divide the transformed signal into overlapping blocks with a length of 2.
Herein, p_t is the overlapping block with a length of two. We have used these blocks to represent the transitions.
• S3: Create the map signal by deploying base-14-to-decimal conversion.
• S4: Extract the histogram of the generated map signal.
Herein, feat is the feature vector with a length of 196 (= 14^2) and θ(.) is the histogram extraction function.
The four steps outlined above define the proposed ChannelPat feature extraction function. By using transitions between channels as the basis for feature extraction, ChannelPat contributes to the interpretability of the model. The features it generates can be linked back to specific brain region interactions, which can be crucial for understanding the underlying neural mechanisms and for providing explainable results in neuroscience research.
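Steps S2 to S4 can be illustrated with a short Python sketch. It assumes the input is already a sequence of 1-based channel indices, and it assumes the base-14 conversion maps a block (a, b) to the 0-based code (a - 1) * 14 + (b - 1); the function name `channel_pat` and this exact coding are illustrative choices, not taken from the paper.

```python
def channel_pat(coded, n_channels=14):
    """Histogram of channel-to-channel transitions in a coded signal.

    coded: sequence of 1-based channel indices.
    Returns a list of n_channels ** 2 counts (196 for 14 channels).
    """
    hist = [0] * (n_channels * n_channels)
    # Overlapping blocks of length two: (c1, c2), (c2, c3), ...
    for first, second in zip(coded, coded[1:]):
        # Read the block as a two-digit base-n number (assumed 0-based code).
        code = (first - 1) * n_channels + (second - 1)
        hist[code] += 1
    return hist


# Hypothetical 3-channel example, so the histogram has 3 ** 2 = 9 bins.
features = channel_pat([3, 1, 2, 1, 3, 2], n_channels=3)
print(features)  # [0, 1, 1, 1, 0, 0, 1, 1, 0]
```

Each bin corresponds to one ordered channel pair, which is what later allows a selected feature to be traced back to a specific transition.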

tkNN
An innovative ensemble classifier has been used in this research, termed tkNN since the t algorithm has been implemented using the kNN [35] classifier. The proposed t algorithm uses a classifier. By changing the hyperparameters of the classifier, additional classifier-based outcomes are created. After that, iterative majority voting (IMV) is applied to these classifier-wise outcomes to create the voted outcomes. In the last phase of the t algorithm, the most accurate outcome is selected as the final outcome.
The primary objective of tkNN is to improve classification accuracy over traditional kNN by exploring a broader range of hyperparameters, such as distance metrics, distance weights, and the number of nearest neighbors (k). By systematically varying these parameters, tkNN generates multiple classifier outcomes, allowing for the selection of the most accurate result.
By implementing a systematic approach to hyperparameter tuning, tkNN aims to automate the selection of the best combination of k values, distance metrics, and weights. This approach ensures that the final model is optimized for performance without relying on manual tuning, which can be time-consuming and less effective.
In this work, the t algorithm has been applied to kNN. We have iteratively changed distances, distance weights, and k values to create more outcomes. In the distance category, L1-norm and L2-norm (Manhattan and Euclidean) have been used. For distance weights, equal, inverse, and squared inverse parameters have been used. Finally, we used values from 1 to 10 for the k value of the kNN. In this way, 60 outcomes (= 2 distances × 3 distance weights × 10 k values) have been created. Moreover, 58 additional voted outcomes have been generated using IMV [36]. In the last phase, the outcome with maximum accuracy is chosen as the final result.
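The 60-configuration grid described above can be enumerated directly. The sketch below pairs the grid with a small pure-Python weighted kNN so that each configuration yields one outcome; the function `knn_predict`, the small constant that guards against division by zero, and the toy training data are hypothetical and only serve the illustration.

```python
from collections import defaultdict
from itertools import product

DISTANCES = ("manhattan", "euclidean")             # L1-norm and L2-norm
WEIGHTS = ("equal", "inverse", "squared_inverse")  # distance weights
K_VALUES = range(1, 11)                            # k = 1, ..., 10


def knn_predict(train_x, train_y, query, k, distance, weight):
    """Predict one label with a (possibly distance-weighted) kNN vote."""
    def dist(a, b):
        if distance == "manhattan":
            return sum(abs(p - q) for p, q in zip(a, b))
        return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

    nearest = sorted(
        ((dist(row, query), label) for row, label in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )[:k]
    votes = defaultdict(float)
    for d, label in nearest:
        if weight == "equal":
            votes[label] += 1.0
        elif weight == "inverse":
            votes[label] += 1.0 / (d + 1e-12)      # guard against d == 0
        else:
            votes[label] += 1.0 / (d * d + 1e-12)
    return max(votes, key=votes.get)


# 2 distances x 3 weights x 10 k values = 60 configurations.
grid = list(product(DISTANCES, WEIGHTS, K_VALUES))

# Hypothetical toy data with two features per observation.
train_x = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]]
train_y = ["Arabic", "Arabic", "Turkish", "Turkish"]
outcomes = [knn_predict(train_x, train_y, [0.8, 0.9], k, d, w)
            for d, w, k in grid]
print(len(grid), len(outcomes))  # 60 60
```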
To better explain the recommended tkNN classifier, the graphical explanation of this classifier is depicted in Figure 1.
• S1: Generate the classifier-wise outcomes by changing the hyperparameters of the kNN classifier.
Herein, c is the classifier-wise outcome, kNN(.) is the kNN classifier, and y is the real outcome.
• S2: Apply IMV to the classifier-based outcomes.
Herein, cacc is the classification accuracy, ρ(., .) is the classification accuracy calculation function, ix are the sorted indices, ϖ(.) is the mode function, and v is the voted outcome.
• S3: Choose the final outcome by deploying a greedy algorithm.
[maksi, xi] = max(cacc) (6a)
Here, maksi is the maximum value, xi is the index of the maximum accuracy, and finout is the final outcome.
The tkNN model explores a wide range of configurations (up to 118 different outcomes, i.e., 60 classifier-wise and 58 voted outcomes), offering a comprehensive analysis of the potential performance of various kNN-based classifiers. This extensive exploration is intended to ensure that the best possible model is selected based on empirical evidence.
The proposed tkNN classifier optimizes classification accuracy and increases robustness by using IMV and providing automatic hyperparameter tuning. These properties make tkNN a powerful classifier.
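The voting and greedy selection stages can be sketched as follows. This is a hedged reading of IMV rather than the authors' exact definition: it assumes the outcomes are sorted by accuracy in descending order and that the top 3, top 4, ..., top n outcomes are each combined with the mode function, which yields n - 2 voted outcomes (58 for n = 60). The names `accuracy` and `imv_select` are introduced here for illustration.

```python
from collections import Counter


def accuracy(predicted, actual):
    """Fraction of positions where the prediction matches the real outcome."""
    return sum(p == t for p, t in zip(predicted, actual)) / len(actual)


def imv_select(outcomes, actual):
    """Iterative majority voting followed by greedy selection.

    outcomes: list of prediction lists, one per classifier configuration.
    Returns (final outcome, its accuracy, number of candidate outcomes).
    """
    # Sort the classifier-wise outcomes by accuracy, best first.
    ranked = sorted(outcomes, key=lambda c: accuracy(c, actual), reverse=True)
    voted = []
    for top in range(3, len(ranked) + 1):
        # Mode of the top-`top` outcomes, taken observation by observation.
        columns = zip(*ranked[:top])
        voted.append([Counter(col).most_common(1)[0][0] for col in columns])
    # Greedy step: keep the single most accurate candidate.
    candidates = list(outcomes) + voted
    best = max(candidates, key=lambda c: accuracy(c, actual))
    return best, accuracy(best, actual), len(candidates)


# Hypothetical labels and three imperfect classifier-wise outcomes.
y = [0, 1, 1, 0]
toy_outcomes = [[0, 1, 0, 0], [0, 0, 1, 0], [1, 1, 1, 0]]
final, final_acc, n_candidates = imv_select(toy_outcomes, y)
print(final, final_acc, n_candidates)  # [0, 1, 1, 0] 1.0 4
```

With 60 classifier-wise outcomes the same function evaluates 60 + 58 = 118 candidates, matching the count given in the text.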

Lobish
In this study, we proposed an explainable EEG classification method. Therefore, we used the selected features to create a sentence in a new language named Lobish. Lobish is a new-generation method for encoding and interpreting brain activities, designed to interpret EEG signals. It uses a symbolic language with four letters corresponding to the four lobes of the brain: F (frontal lobe), O (occipital lobe), P (parietal lobe), and T (temporal lobe). The meanings of these letters are as follows:
• F demonstrates cognitive functions.
• P represents sensory processing and spatial awareness.
• T involves auditory processing and memory.
• O indicates visual processing.
Using the four defined letters, 16 (= 4^2) two-letter transitions have been generated, and the translation of these two-letter words in Lobish is explained below.

• FF defines sustained cognitive effort.
• TT indicates continuous auditory processing or engaging with memory recall.
• PP depicts ongoing sensory integration and spatial processing.
• OO defines continuous visual processing.
• FT defines the transition from planning or decision-making to recalling information or understanding spoken language.
• FP represents moving from cognitive tasks to integrating sensory information.
• FO depicts the transition from planning or thinking to analyzing visual information.
• TP uses auditory information or memory to assist in sensory processing.
• TO defines recalling visual memories or interpreting visual information based on auditory input.
• PO represents integrating sensory and spatial information with visual processing.
• TF uses auditory or memory information for planning or decision-making.
• PF defines transitioning from sensory information to cognitive tasks.
• OF uses visual information for cognitive processes.
• PT integrates sensory information with memory or auditory processing.
• OT defines associating visual stimuli with memory recall or auditory information.
• OP represents using visual information for sensory and spatial awareness.
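The Lobish alphabet and its two-letter vocabulary listed above are small enough to enumerate programmatically. The snippet below is only a convenience sketch; the dictionary name `LOBES` and the shortened descriptions are paraphrased from the list of letter meanings.

```python
from itertools import product

# The four Lobish letters and the lobe functions they stand for.
LOBES = {
    "F": "frontal lobe: cognitive functions",
    "P": "parietal lobe: sensory processing and spatial awareness",
    "T": "temporal lobe: auditory processing and memory",
    "O": "occipital lobe: visual processing",
}

# Every ordered pair of letters is one two-letter Lobish word: 4 * 4 = 16.
words = ["".join(pair) for pair in product(LOBES, repeat=2)]
print(len(words))  # 16
```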
Using this language, we obtained explainable results from the proposed channel-based model. Moreover, Lobish provides an interpretation of lobe transitions.
The steps are given below to better explain the channel-based feature engineering model.
• Step 1: Apply channel-based transformation to the EEG signal.
CS = transformer(signal)
Herein, CS is the transformed signal, transformer is the proposed channel-based transformation, and signal is the input EEG signal with 14 channels.
• Step 2: Extract features by deploying the proposed ChannelPat.
fv = CP(CS)
Herein, fv is the feature vector and CP(.) is the proposed ChannelPat. In this step, 196 (= 14 × 14) features have been extracted since the used EEG signal dataset has 14 channels.
• Step 3: Repeat Steps 1-2 until the number of the signals is reached and a feature matrix is created.
Steps 1-3 have been defined as the proposed feature extraction method of the presented feature engineering model. Moreover, Figure 3 represents the proposed channel-based transformation and ChannelPat.
• Step 4: Apply the INCA [37] feature selector to choose the most informative features.
Herein, sf is the selected feature vector, INCA(.) is the INCA feature selection function, and X is the created feature matrix. By utilizing the selected features, both classification and explainable results have been obtained.
Step 4 defines the feature selection phase of the proposed feature engineering model.
• Step 5: Classify the selected feature vector by deploying the tkNN classifier.
• Step 7: By utilizing the extracted Lobish sentences, obtain explainable results. These sentences, composed of Lobish symbols, provide a structured interpretation of the brain's activity, translating complex neural processes into a symbolic language. This approach enables a deeper understanding of the EEG data by linking specific brain lobe transitions to cognitive functions, thereby facilitating both precise classification and meaningful, interpretable explanations of the observed brain dynamics. In this step, histograms of the symbols and transition tables of the symbols have been computed.
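To make the link between selected features and Lobish sentences concrete, the sketch below decodes a feature index back into its two channels and writes one lobe letter per channel. Two things here are assumptions and not taken from the paper: the 0-based feature coding (first channel × 14 + second channel) and, in particular, the channel-to-lobe string `CHANNEL_LOBES`, which is a hypothetical layout for a 14-channel headset and must be replaced by the montage actually used.

```python
# Hypothetical lobe letter for each of the 14 channels, in channel order.
CHANNEL_LOBES = "FFFFTPOOPTFFFF"


def feature_to_word(index, lobes=CHANNEL_LOBES):
    """Translate one 0-based ChannelPat feature index into a two-letter word."""
    n_channels = len(lobes)
    # Undo the assumed base-14 coding to recover the two channel positions.
    first, second = divmod(index, n_channels)
    return lobes[first] + lobes[second]


def lobish_sentence(selected_indices, lobes=CHANNEL_LOBES):
    """Concatenate the two-letter words of all selected features."""
    return "".join(feature_to_word(i, lobes) for i in selected_indices)


# Hypothetical selection of three features.
print(lobish_sentence([0, 62, 195]))  # FFTOFF
```

Under this reading, 144 selected features always produce a sentence of 144 × 2 = 288 letters, whichever montage is plugged in.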

Experimental Results
This section presents the results of the proposed channel-based EEG language detection model. To implement this model, MATLAB (version 2024a) was used. Firstly, the EEG signals were gathered from participants and segmented, and each signal was stored as a .mat file.
After that, the proposed channel-based model was programmed using various functions: channel transformation, the presented ChannelPat, the INCA feature selection function, and the tkNN classifier. We coded these functions using .m files. The proposed model was implemented using the CPU mode since this model is a feature engineering model with linear time complexity. The parameters of the recommended model are tabulated in Table 2. Using the parameters listed in Table 2, we have implemented the proposed channel-based feature engineering model. This model was designed to solve a binary classification problem, as there are two classes: (i) Arabic and (ii) Turkish. Therefore, classification accuracy, sensitivity, specificity, precision, F1-score, and geometric mean performance evaluation parameters have been used. The confusion matrix of the results was used to compute these performance evaluation parameters, and the computed confusion matrix of the proposed model is shown in Figure 4. Moreover, the performance evaluation metrics are tabulated in Table 3.
Diagnostics 2024, 14, x FOR PEER REVIEW 13 of 22
As can be seen in Table 3, all performance metrics have attained over 98%.
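For reference, the six evaluation parameters named above can all be derived from the four cells of a binary confusion matrix. The sketch below uses the usual textbook definitions, with the geometric mean taken as the square root of sensitivity times specificity (an assumption, since the paper does not spell out its formula), and the counts in the example are hypothetical rather than the values of Figure 4.

```python
from math import sqrt


def binary_metrics(tp, fn, fp, tn):
    """Standard metrics from a 2 x 2 confusion matrix."""
    sensitivity = tp / (tp + fn)            # recall of the positive class
    specificity = tn / (tn + fp)            # recall of the negative class
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "gmean": sqrt(sensitivity * specificity),
    }


# Hypothetical counts, with one language treated as the positive class.
metrics = binary_metrics(tp=90, fn=10, fp=5, tn=95)
print(metrics["accuracy"])  # 0.925
```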
The second performance evaluation parameter is the time complexity [38].The big O notation has been used to compute the time complexity of the proposed model.
Channel-based transformation: The time complexity of this transformation is O(N), where N is the length of the signal.
ChannelPat: ChannelPat only uses the conversion from base 14 to a decimal number and extracts a histogram.In this case, the time complexity of this feature extraction function is O(N).
INCA: INCA is an iterative feature selector. Thus, its time complexity is O(S + RC), where S is the time complexity of the NCA feature selector, R is the range of the iteration, and C is the time complexity of the kNN classifier.
tkNN: tkNN is an iterative ensemble classifier. The time complexity is O(PC + V + P), where P is the tested number of parameters or generated predictions, and V is the number of voted outcomes.
Therefore, the time complexity of the proposed channel-based model is equal to O(N + S + RC + PC + V). This result demonstrates that the proposed channel-based EEG signal classification model has time complexity that is linear in the signal length N.
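The O(S + RC) term for INCA reflects one ranking pass followed by R classifier evaluations. The loop below sketches that structure under stated assumptions: the feature weights are taken as already computed by NCA (the S term), and `loss_fn` is a hypothetical callback standing in for the kNN misclassification rate on a candidate subset (the C term). It is an outline of the iteration, not the reference implementation of [37].

```python
def inca_select(weights, loss_fn, start, stop):
    """Pick the best top-ranked feature subset over a range of sizes.

    weights: one importance weight per feature (assumed to come from NCA).
    loss_fn: callable taking a list of feature indices and returning a loss.
    start, stop: smallest and largest subset size to test (inclusive).
    """
    # Rank features once, most important first.
    ranked = sorted(range(len(weights)), key=lambda i: weights[i],
                    reverse=True)
    best_subset, best_loss = None, float("inf")
    # R = stop - start + 1 iterations, each calling the classifier once.
    for size in range(start, stop + 1):
        subset = ranked[:size]
        loss = loss_fn(subset)
        if loss < best_loss:
            best_subset, best_loss = subset, loss
    return best_subset, best_loss


# Hypothetical weights and a toy loss that is minimal for three features.
toy_weights = [0.1, 0.9, 0.5, 0.7]
chosen, chosen_loss = inca_select(toy_weights,
                                  lambda s: abs(len(s) - 3), 1, 4)
print(chosen, chosen_loss)  # [1, 3, 2] 0
```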
In this work, we have presented a new channel-based feature engineering model. Specifically, we introduced a transformer for EEG signals that uses the differences in EEG channels to create new-generation features. In this transformer, the INCA feature selector has been used to select the most informative 144 features out of 196 features. Using these selected features, we created a Lobish sentence with a length of 288 (= 144 × 2), as each feature of our model has been coded with two channels. The resulting string in Lobish is "TFTTPOOPPTFFFFPOFTFFOTTFFFFTFTOOFFFFOOPOFFOTPTFOOFFFFFTFFFFFFFTFPPFFPFTOTPFFFFTPTFTPPFTFPOFTFFOFOFFFTOFFFFOFTTOPPFFPFFFFFPOPTFFPFFOFFOTOPFFTPFFOFPPFFFPFFPOFFPFOOPTFFTFFTPOTFPFPTFTOFFPTPTFFFFPFFFPFOTFFFTFFFFTFFOFFPFFFFFOFTFOFFTFFFTFFFFFOFFFTFPFPFFFFPFOFFPPFFOFOFTPFFFPFFFFFFOTFFPFPFOFTFOFO".
The translation of the above sentence is given below:
• TFTT: Temporal lobe activity indicating memory and auditory processing, and brief cognitive processing, then back to the temporal lobe.
• OOPOFF: Occipital to parietal to frontal transition, indicating visual information moving through sensory integration to cognitive processing.
• OTPTFOO: This complex transition indicates multiple integrations between the occipital, temporal, parietal, and frontal lobes, suggesting intense processing of sensory, memory, and cognitive information.
• FFFFFT: Sustained cognitive effort with a brief switch to the temporal lobe for memory recall.
• FFFFFFT: Continued high-level cognitive processing with brief temporal involvement.
• FP: Simple transition from frontal to parietal lobes, showing cognitive effort translated into sensory integration.
• PFFO: Sensory information is being processed back into cognitive effort and then visual processing.
Figure 5a shows 158 frontal, 44 temporal, 41 occipital, and 45 parietal lobe activities. We used demonstration and listening methods to obtain EEG signals during the data collection phase. The obtained Lobish sentence validated this process since the entropy of this sentence, according to Figure 5, is equal to 1.7081. This entropy is relatively high, close to the maximum of 2 (= log2 4). This indicates that language detection is a complex process for the cortex. Moreover, the high frontal lobe activity suggests significant cognitive processing. In contrast, the involvement of other lobes highlights the integration of sensory, auditory, and visual information necessary for comprehensive language detection and understanding.
According to Figure 5b, the transition frequencies are FF (86), TT (3), PP (6), OO (5), FT (25), FP (24), FO (23), TP (9), TO (6), PO (7), TF (26), PF (24), OF (22), PT (8), OT (7), and OP (6). The dominant transition, FF, with a frequency of 86, indicates a significant and sustained cognitive effort within the frontal lobe. The high frequencies of transitions involving the frontal lobe, such as TF (26), FT (25), FP (24), PF (24), FO (23), and OF (22), suggest that the cognitive processes frequently interact with memory recall, sensory integration, and visual processing. These interactions showcase the complexity of the task and the integration of sensory information. The moderate frequencies of transitions like TP (9), PT (8), PO (7), OT (7), TO (6), and OP (6) indicate a balanced but less frequent integration of sensory and visual information with cognitive functions. The lower frequencies of TT (3) and PP (6) suggest limited continuous auditory and sensory processing within their respective lobes independently. Moreover, the Shannon entropy of the transitions has been computed as 3.3986. This value is close to the maximum entropy of 4 (= log2 16). Therefore, this entropy demonstrates that the language detection process is a complex process.
Moreover, the transition matrix of the characters for the generated words has been computed, and this matrix is demonstrated in Figure 6. Figure 6b denotes that the F state is dominant: the probability of remaining in the F state is higher than other states, indicating that the F state is relatively stable. The T state is transient: the probability of remaining in the T state is very low, suggesting that the T state quickly transitions to other states. The P and O states are also transient: the probabilities of remaining in the P and O states are low, with a higher likelihood of transitioning to the F state.
These analyses help us understand the system's dynamics using the transition matrix data. Such an analysis can be used to model and understand transition probabilities in EEG signals or other physiological signals.
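The entropy values and the transition probabilities discussed above follow directly from the reported counts. The sketch below recomputes them with the Python standard library; the function names are introduced here, and the transition matrix is assumed to be row-normalized (each row gives the probabilities of the next letter given the current one), which may differ in layout from Figure 6.

```python
from collections import Counter
from math import log2


def shannon_entropy(counts):
    """Shannon entropy (in bits) of a dictionary of counts."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values() if c)


def transition_counts(sentence):
    """Count overlapping two-letter transitions in a Lobish sentence."""
    return Counter(a + b for a, b in zip(sentence, sentence[1:]))


def transition_matrix(pair_counts, letters="FTPO"):
    """Row-normalized probabilities P(next letter | current letter)."""
    matrix = {}
    for src in letters:
        row_total = sum(pair_counts.get(src + dst, 0) for dst in letters)
        matrix[src] = {
            dst: (pair_counts.get(src + dst, 0) / row_total
                  if row_total else 0.0)
            for dst in letters
        }
    return matrix


# Counts as reported for Figure 5a and Figure 5b.
LETTER_COUNTS = {"F": 158, "T": 44, "O": 41, "P": 45}
PAIR_COUNTS = {
    "FF": 86, "TT": 3, "PP": 6, "OO": 5, "FT": 25, "FP": 24, "FO": 23,
    "TP": 9, "TO": 6, "PO": 7, "TF": 26, "PF": 24, "OF": 22, "PT": 8,
    "OT": 7, "OP": 6,
}

letter_entropy = shannon_entropy(LETTER_COUNTS)
pair_entropy = shannon_entropy(PAIR_COUNTS)
probabilities = transition_matrix(PAIR_COUNTS)
stay_in_f = probabilities["F"]["F"]   # 86 / 158
stay_in_t = probabilities["T"]["T"]   # 3 / 44
```

With these counts the two entropies come out at roughly 1.708 and 3.399 bits, in line with the reported 1.7081 and 3.3986, and the probability of staying in F (86/158, about 0.54) is far higher than that of staying in T (3/44, about 0.07).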

Discussions
We compared the proposed model with the commonly used local binary pattern (LBP) [39], local ternary pattern (LTP) [40], statistical feature extractor (SF) [41], hexadecimal local pattern (HLP) [42], and Pascal's triangle lattice pattern (PTLP) [43]. We used these feature extraction functions (LBP, LTP, SF, HLP, and PTLP) along with the INCA feature selector and kNN classifier to evaluate their classification performance on our dataset. Because these methods do not include a channel-based transformation, we report their results from the most accurate channel (the first). The computed classification accuracies for the various feature extraction functions are given in Figure 7.
To demonstrate the effect of the proposed channel-based transformation, we applied our EEG transformer to the three least accurate feature extractors: LBP, LTP, and SF. The obtained results are shown in Figure 8. The proposed channel-based transformation increased the classification accuracies of the SF, LBP, and LTP feature extractors from 58.59%, 71.74%, and 68.81% to 62.19%, 93.07%, and 95.96%, respectively.
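The size of each improvement follows directly from the reported accuracies. The short sketch below computes the gain in percentage points for each extractor; the variable names are illustrative only.

```python
# Accuracies (%) reported without and with the channel-based transformation.
before = {"SF": 58.59, "LBP": 71.74, "LTP": 68.81}
after = {"SF": 62.19, "LBP": 93.07, "LTP": 95.96}

# Gain in percentage points for each feature extractor.
gains = {name: round(after[name] - before[name], 2) for name in before}

for name, gain in gains.items():
    print(f"{name}: {before[name]:.2f}% -> {after[name]:.2f}% (+{gain:.2f} points)")
```

The gain is modest for SF and much larger for the two pattern-based extractors, LBP and LTP.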
The last step of the proposed model is the tkNN method. We have compared the tkNN classifier with ensemble kNN, and the results are shown in Figure 9. The ensemble kNN of the MATLAB classification learner tool attained a 97.37% classification accuracy for our selected feature vector, while the proposed tkNN yielded a 98.59% classification accuracy.
The salient features of this work are given below:
• A novel channel-based transformation function was proposed to obtain higher classification accuracy than traditional feature extractors such as LBP, LTP, and SF.
• The ChannelPat feature extraction function used channel-based transformation to create a map signal, achieving a high classification accuracy of 98.59%.
• The innovative tkNN classifier outperformed the ensemble kNN tool, achieving a classification accuracy of 98.59% compared to 97.37%.
• Lobish, a new symbolic language, was introduced to obtain explainable results.
• The proposed channel-based feature engineering model attained over 98% classification performance.
• This model is a lightweight EEG language detection model since it has linear time complexity.
• A new EEG language detection dataset was collected, and this dataset includes two languages: (1) Arabic and (2) Turkish.
• Lobish has identified the necessity of integrating sensory, auditory, and visual information and high frontal lobe activity for language detection.
• By translating EEG signals into symbolic representations, Lobish has provided deeper insights into the neural processes underlying language perception and processing, paving the way for advanced research in neuroscience and cognitive science.
• The ability to generate Lobish sentences from EEG data opens up new avenues for exploring how different brain regions interact during specific tasks, providing insights that were previously difficult to obtain.
• Lobish serves as a bridge between neuroscience, cognitive science, and artificial intelligence. Its symbolic nature makes it accessible to researchers from different disciplines.
• Lobish can be used to develop personalized learning strategies that align with a student's cognitive strengths and weaknesses, optimizing learning outcomes.
• The creation of Lobish represents a shift towards a more human-centric approach in EEG analyses.
• Lobish has the potential to transform the way EEG data are used in both research and practical applications, making brain-computer interaction more intuitive and accessible.

Figure 1. The schematic block diagram of the proposed tkNN classifier. Here, out is the parameter-based outcome and vot is the voted outcome.

2.2.5. Proposed Feature Engineering Model
A new feature engineering model has been proposed to investigate the channel-based transformation and ChannelPat. The proposed feature engineering model has four phases: (1) feature extraction, (2) feature selection, (3) classification, and (4) explainable result generation. The graphical demonstration of the proposed feature engineering model is shown in Figure 2.

Figure 2. The graphical depiction of the presented explainable feature engineering model.

• Step 4: Apply the INCA [37] feature selector to choose the most informative features.

$res = \mathrm{tkNN}(sf, y)$ (10)

Figure 3. Schematic diagram of channel transformer and ChannelPat (feature extraction of proposed model). c: channel values, k: number of channels, n: length of signal, q: transformed values.
• POO: Transition from parietal to occipital lobes, indicating sensory processing moving into visual processing.
• PPT: Parietal to temporal transition, indicating the integration of sensory information with memory.
• FFFF: Sustained frontal lobe activity, indicating prolonged cognitive effort and planning.
• POF: Parietal to occipital to frontal transition, indicating sensory and visual information being integrated into cognitive processes.
• TFF: Temporal to frontal transition, indicating memory recall being used for planning or decision-making.
• OTT: Occipital to temporal transition, indicating visual information processing leading to memory recall or auditory processing.
• FFFF: Repeated frontal lobe activity, reinforcing cognitive effort.
• TFTOO: Temporal to frontal transition with sustained occipital activity, showing memory integration with visual processing.
• FFFF: Continued cognitive effort in the frontal lobe.
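The letter-to-lobe reading used in the list above can be expressed as a small lookup. The sketch below expands each Lobish word into its sequence of lobes and splits it into two-letter transitions; the helper names `decode_word` and `transitions` are illustrative, and the functional interpretations (memory, vision, planning) remain those given in the text.

```python
# Lobish alphabet: one letter per brain lobe.
LOBES = {"F": "frontal", "T": "temporal", "P": "parietal", "O": "occipital"}


def decode_word(word):
    """Expand a Lobish word into the sequence of lobes it names."""
    return [LOBES[ch] for ch in word]


def transitions(word):
    """Split a Lobish word into its consecutive two-letter transitions."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


sentence = ["POO", "PPT", "FFFF", "POF", "TFF", "OTT", "FFFF", "TFTOO", "FFFF"]
for word in sentence:
    print(f"{word}: {' -> '.join(decode_word(word))} | {transitions(word)}")
```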

Figure 6. Character transition probability matrix for generated word.

Figure 6a indicates that the F state is dominant. The probability of remaining in the F state is higher than in other states, indicating that the F state is relatively stable. The T state is transient: the probability of remaining in the T state is very low, suggesting that the T state quickly transitions to other states. The P and O states are also transient: the probabilities of remaining in the P and O states are quite low, indicating that these states change quickly.

Figure 7. Graph of accuracy (%) obtained versus various feature extraction functions including proposed method.

Figure 7 shows that the best competing model is the PTLP feature extraction function, which attained 80.28% classification accuracy. In comparison, our model reached 98.59% classification accuracy.

Figure 8. The summary of the effect of the channel-based transformation on the results obtained.

Table 1. Summary of state-of-the-art algorithms employed for language/word detection using EEG signals.

• We have proposed a new feature engineering model. Two EEG-specific models have been used: channel-based transformation and feature extraction.

• Step 6: Extract Lobish symbols by utilizing the indices of the selected features. In this work, we employed a transition table-based feature extraction function, where each selected feature represents a transition between two EEG channels. Consequently, each selected feature corresponds to two Lobish symbols, which are derived from the brain lobes of those two channels. The Lobish symbols provide a symbolic interpretation of the brain's activity, supporting both classification and explainable results, as they offer insights into the underlying neural processes associated with the observed EEG patterns. The pseudocode of this step is given in Algorithm 2.

Algorithm 2. Pseudocode of the proposed Lobish sentence generation method.
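Since Algorithm 2 itself is not reproduced here, the following sketch only illustrates the idea described in Step 6: a selected feature index is mapped back to a pair of channels, and each channel is mapped to the letter of its lobe. Everything specific in it is an assumption made for illustration: the 14-channel montage in `CHANNELS`, the row-major index layout (index = source * k + target), and the rule that the first letter of the electrode name determines the lobe.

```python
# Hypothetical 14-channel montage; the actual channel list is an assumption.
CHANNELS = ["AF3", "F7", "F3", "FC5", "T7", "P7", "O1",
            "O2", "P8", "T8", "FC6", "F4", "F8", "AF4"]


def lobe_symbol(channel):
    """Map an electrode name to a Lobish letter (F, T, P, or O).

    Assumption: the leading letter names the lobe, and anterior-frontal
    (AF*) electrodes are counted as frontal. Central (C*) electrodes are
    not handled by this sketch.
    """
    first = channel[0].upper()
    return "F" if first == "A" else first


def lobish_from_features(selected, channels=CHANNELS):
    """Turn selected transition-feature indices into a Lobish string.

    Assumption: feature index = source_channel * k + target_channel,
    so each selected feature yields two symbols.
    """
    k = len(channels)
    symbols = []
    for idx in selected:
        src, dst = divmod(idx, k)
        symbols.append(lobe_symbol(channels[src]) + lobe_symbol(channels[dst]))
    return "".join(symbols)


print(lobish_from_features([0, 62, 79]))
```

In this toy call, index 62 corresponds to the T7-to-O1 transition and index 79 to P7-to-T8 under the assumed layout, so each selected feature contributes exactly two Lobish symbols, as the step describes.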

Table 2. Parameters used for the recommended feature engineering.

Table 3. The summary of the results obtained for the recommended channel-based feature engineering model.
